ABSTRACT

The Gram-negative plant-pathogenic bacterium 
Xanthomonas campestris pv. vesicatoria (Xcv) is 
an important model to elucidate the mechanisms 
involved in the interaction with the host. To gain 
insight into the transcriptome of the Xcv strain 
85-10, we took a differential RNA sequencing 
(dRNA-seq) approach. Using a novel method to
automatically generate comprehensive transcription
start site (TSS) maps, we report 1421 putative TSSs
in the Xcv genome. Genes in Xcv exhibit a poorly
conserved -10 promoter element and no consensus
Shine-Dalgarno sequence. Moreover, 14% of all
mRNAs are leaderless and 13% have unusually
long 5'-UTRs. Northern blot analyses
confirmed 16 intergenic small RNAs and seven
cis-encoded antisense RNAs in Xcv. Expression of eight
intergenic transcripts was controlled by HrpG and 
HrpX, key regulators of the Xcv type III secretion 
system. More detailed characterization identified 
sX12 as a small RNA that controls virulence of Xcv 
by affecting the interaction of the pathogen and 
its host plants. The transcriptional landscape of
Xcv is unexpectedly complex, featuring
abundant antisense transcripts, alternative TSSs
and clade-specific small RNAs.



INTRODUCTION 

At a staggering pace, new high-throughput sequencing
technologies have helped to unveil the transcriptional
complexity of many organisms in all kingdoms of
life (1-3). The recently developed differential RNA
sequencing approach (dRNA-seq) has added yet another
perspective. dRNA-seq, based on a selective enrichment
of native 5'-ends, has been shown to accurately and
cost-effectively identify transcription start sites
(TSSs) and RNA processing sites for whole genomes
(4). In addition to the obvious advantages for the
analysis of 5'-UTR or promoter elements, dRNA-seq
allows one to distinguish independently transcribed short
non-coding and coding RNAs from post-transcriptional
processes such as maturation (4). However, a fully
automated method to annotate and statistically evaluate
TSSs in large dRNA-seq data sets has been missing so far.
Here, we sketch a procedure to automatically identify
TSSs.

Transcriptome analyses in plant pathogenic bacteria
have so far focused mainly on coding regions and the
regulon controlling type III secretion [e.g. (5,6)]. A
recent deep sequencing analysis of Pseudomonas syringae
identified many small RNA (sRNA) candidates, most
of which, however, await validation by independent
methods (7).

The Gram-negative plant pathogenic γ-proteobacterium
Xanthomonas campestris pv. vesicatoria
(Xcv) is the causal agent of bacterial spot disease on 
pepper and tomato and is of great economic importance 
in regions with a warm and humid climate (8). Xcv serves 
as a model system to elucidate the molecular communica- 
tion between plant pathogens and their hosts and to char- 
acterize bacterial virulence strategies. Genome analysis 
predicted 4726 open reading frames (ORFs) in the Xcv 
strain 85-10 (9), yet the overall gene structure and 
non-coding RNA output of this model pathogen are still 
poorly understood. 

Essential for pathogenicity of Xcv on susceptible host 
plants is the type III secretion (T3S) system, encoded by 
the hrp [hypersensitive response (HR) and pathogenicity] 
gene cluster (10). In Xcv, as in most Gram-negative bac- 
terial pathogens, the T3S nanomachine translocates a suite 
of effector proteins into the plant cell where they manipu- 
late host cellular processes to the benefit of the pathogen, 
e.g. by suppression of basal plant defense responses 
(9,11-13). hrp mutants do not grow in plant tissue, and
they no longer cause disease in susceptible plants or the
HR in resistant plants (10). The HR is a local, rapid
programmed cell death at the site of infection, which 
coincides with arrest of bacterial multiplication in the 
plant (14,15). 

The T3S system is transcriptionally induced in certain 
minimal media and in the plant (16,17). A key regulatory
protein is the OmpR-type response regulator HrpG,
which is activated by unknown plant signals and 
controls the expression of a genome-wide regulon 
including hrp, type III effector and putative virulence 
genes (16-19). HrpG-mediated activation of gene expres- 
sion depends in most cases on the AraC-type transcrip- 
tional activator HrpX (18), which binds to a conserved 
motif (plant-inducible promoter; PIP box) in the pro- 
moters of target genes (20). The identification of a point 
mutation in HrpG (termed HrpG*), which renders the 
protein constitutively active, was key for the analysis of 
T3S and the identification of putative virulence factors 
that are cotranscribed with the T3S system (19,21). An 
open question was whether virulence gene expression in 
Xcv is post-transcriptionally regulated, for instance 
by sRNAs. Here, we provide for the first time an in- 
sight into the transcriptional landscape of a plant patho- 
genic bacterium and the involvement of sRNAs in its 
virulence. 
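The PIP box bound by HrpX has the consensus TTCG-N16-TTCG (20). As a minimal sketch, assuming that only exact matches to this consensus are of interest (published PIP box searches often tolerate mismatches, which this sketch does not), the following Python function scans a promoter sequence on both strands. The function name, the regular expressions and the toy sequence are hypothetical illustrations, not part of the original study.

```python
import re
from typing import List, Tuple

# Exact consensus TTCG-N16-TTCG; the zero-width lookahead also reports overlapping hits.
PIP_FORWARD = re.compile(r"(?=TTCG[ACGT]{16}TTCG)")
# Reverse complement of the consensus (CGAA-N16-CGAA), i.e. a PIP box on the (-)-strand.
PIP_REVERSE = re.compile(r"(?=CGAA[ACGT]{16}CGAA)")


def find_pip_boxes(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact PIP box match in seq."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in PIP_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in PIP_REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical promoter fragment carrying one forward-strand PIP box.
promoter = "GG" + "TTCG" + "C" * 16 + "TTCG" + "GG"
print(find_pip_boxes(promoter))  # [(2, '+')]
```

A regular expression keeps the sketch short; a position weight matrix would be the natural extension if degenerate PIP boxes were to be scored.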

MATERIALS AND METHODS 

RNA isolation for 454 pyrosequencing, RACE analysis 
and northern blot 

RNA was isolated from NYG-grown Xcv strains 85-10
and 85* (exponential growth phase) by phenol extraction
and treated with DNase I (Roche). For RACE and
northern blot analyses, RNA was isolated from
NYG-grown Xcv strains in exponential and stationary
growth phases, as described (22). RACE analyses were
carried out as described (23) with modifications [for
detailed information see Supporting Information (SI)].
Northern blots were performed as described (24) using
10 µg RNA and 5-10 pmol [γ-32P]-ATP end-labeled
oligodeoxynucleotides (Supplementary Table S1).
Hybridization signals were visualized with a 
phosphoimager (FLA-3000 Series, Fuji). Northern blot 
hybridizations were performed at least twice with inde- 
pendently isolated RNA. 

Construction of cDNA libraries for dRNA-seq and 454 
pyrosequencing 

Prior to RNA treatment and cDNA synthesis, equal 
amounts of RNA from the two Xcv strains 85-10 and 
85* were mixed. dRNA-seq libraries were prepared ac- 
cording to Sharma et al. (2010) and sequenced with a 
Roche 454 sequencer using FLX and Titanium chemistry 
(see SI). 

Annotation of transcription start sites 

We aimed at the automated identification of TSSs based
on the discrimination between narrow clusters of
dRNA-seq reads that might represent a TSS and the
distribution of individual read starts. The density of read
starts varies across the genome and can be modeled
locally by a Poisson distribution with a parameter $\lambda$. We
used fixed-length intervals of size $l$ to determine $\lambda_r = s_r/l$
from the number of read starts $s_r$ in the region $r$. The
parameter $\lambda_{ave}$ models the average genome-wide arrival
rate of read starts. $\lambda$ is defined as $\lambda_r/\lambda_{ave}$. The
corresponding Poisson distribution $F(k,\lambda)$ describes the probability
that at most $k$ read starts are observed at a given
genomic position. We used library 1 to determine $\lambda_m$ for
the background distribution of read starts. Similarly,
library 2 was used to obtain $\lambda_p$ to model the distribution
biased towards the TSS.
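The rate estimation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes read starts have already been counted per genomic position for a single strand, and the function names and toy data are hypothetical. It computes λ_r = s_r/l for consecutive fixed-length regions, normalizes by the genome-wide rate λ_ave, and evaluates the Poisson CDF F(k, λ).

```python
import math
from typing import List


def local_rates(read_starts: List[int], l: int) -> List[float]:
    """lambda_r = s_r / l for consecutive, non-overlapping regions of length l.

    read_starts[i] holds the number of read starts at genomic position i on one
    strand. A shorter final region is divided by its actual length (an assumption).
    """
    rates = []
    for begin in range(0, len(read_starts), l):
        region = read_starts[begin:begin + l]
        rates.append(sum(region) / len(region))
    return rates


def normalized_rates(read_starts: List[int], l: int) -> List[float]:
    """lambda = lambda_r / lambda_ave, where lambda_ave is the genome-wide rate."""
    lam_ave = sum(read_starts) / len(read_starts)
    if lam_ave == 0:
        raise ValueError("no read starts: genome-wide rate is zero")
    return [lam_r / lam_ave for lam_r in local_rates(read_starts, l)]


def poisson_cdf(k: int, lam: float) -> float:
    """F(k, lambda): probability of observing at most k read starts."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


# Hypothetical toy data: 1000 positions, quiet first half, busier second half.
starts = [1] * 500 + [3] * 500
print(normalized_rates(starts, 500))  # [0.5, 1.5]
```

Building each Poisson term from the previous one avoids large factorials; for the small normalized rates used here this is accurate and simple.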

A TSS is defined as the genomic position at which the 
observed number of read starts in library 2 significantly 
exceeds the background distribution of read starts in 
library 1. The significance of a putative TSS was 
determined as follows: for each genomic position, the dif- 
ference of the number of read starts P in library 2 and M 
in library 1, D = P-M, was calculated. The difference of 
two Poisson distributed variables, D, follows a Skellam 
distribution (25) whose cumulative distribution function 
is given by 

$$F(D,\lambda_p,\lambda_m) = \sum_{d=-\infty}^{D} e^{-(\lambda_p+\lambda_m)} \left(\frac{\lambda_p}{\lambda_m}\right)^{d/2} I_{|d|}\left(2\sqrt{\lambda_p\lambda_m}\right), \qquad D \in \mathbb{Z}$$

where $I_{|d|}$ is the modified Bessel function of the first kind
and integer order $|d|$. Furthermore, $1 - F(D,\lambda_p,\lambda_m)$
represents the probability that a difference of more than $D$ read
starts is observed given the normalized rates of read starts
$\lambda_p$ and $\lambda_m$. To reduce the influence of window sizes and
local variation of transcriptional activity, a sliding window
of size $x$ was shifted by $y$ nucleotides along the genome
and each site was tested $t = x/y$ times for being a TSS.
The P-value was obtained using the geometric mean

$$P = \left(\prod_{i=1}^{t} p_i\right)^{1/t}$$

where $p_i$ denotes the P-value obtained in the $i$-th test. Note
that only sites with a minimum expression of three read
starts within a distance of <5 nt were tested. Furthermore,
we excluded sites in the vicinity of perfectly aligned hit
blocks, i.e. stacks of hits that all share a common 5'-
and 3'-end. To determine $\lambda_r$ we selected a region size of
500 nt. For the sliding window approach an offset of 50 nt
was used. All potential TSSs significant at the P = 0.05
level are listed in Supplementary Table S2. In order to
achieve a high positive predictive value for data sets of
similar size, these parameters have been fixed globally in
our study and may have to be adjusted for the application
of the method to other data sets.
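Putting the Skellam test and the sliding-window combination together, a minimal Python sketch might look as follows. Several details are assumptions rather than a description of the authors' code: the Bessel function is evaluated from its power series in log space (a library routine would be preferable at high coverage), the infinite sums are truncated at ±200, window starts are taken as multiples of the offset y (with x divisible by y), circular genomes are ignored, and the candidate filter is assumed to act on library 2 read starts. `skellam_sf` returns 1 - F(D, λp, λm), i.e. the probability of a difference larger than D. All function names are hypothetical.

```python
import math
from typing import Callable, List


def log_bessel_i(n: int, x: float, terms: int = 400) -> float:
    """log I_n(x) for integer n >= 0 from the series sum_k (x/2)^(2k+n) / (k! (k+n)!)."""
    if x == 0.0:
        return 0.0 if n == 0 else -math.inf
    half_log = math.log(x / 2.0)
    logs = [(2 * k + n) * half_log - math.lgamma(k + 1) - math.lgamma(k + n + 1)
            for k in range(terms)]
    top = max(logs)  # log-sum-exp to avoid overflow for large orders
    return top + math.log(sum(math.exp(v - top) for v in logs))


def skellam_pmf(d: int, lam_p: float, lam_m: float) -> float:
    """P(D = d) for D = P - M with P ~ Poisson(lam_p), M ~ Poisson(lam_m); rates > 0."""
    log_p = (-(lam_p + lam_m) + 0.5 * d * (math.log(lam_p) - math.log(lam_m))
             + log_bessel_i(abs(d), 2.0 * math.sqrt(lam_p * lam_m)))
    return math.exp(log_p)


def skellam_sf(D: int, lam_p: float, lam_m: float, bound: int = 200) -> float:
    """1 - F(D, lam_p, lam_m) = P(difference > D), summing the upper tail directly
    (truncated at `bound`) so that small P-values keep their precision."""
    return sum(skellam_pmf(d, lam_p, lam_m) for d in range(D + 1, bound + 1))


def covering_windows(pos: int, x: int = 500, y: int = 50) -> List[int]:
    """Starts of the windows [s, s + x), shifted by y, that contain pos
    (t = x / y of them for sites away from the genome start)."""
    first = (pos // y) * y - (x - y)  # smallest multiple of y greater than pos - x
    return [s for s in range(first, pos + 1, y) if s >= 0]


def geometric_mean(p_values: List[float]) -> float:
    """P = (prod_i p_i)^(1/t), computed via logarithms to avoid underflow."""
    if min(p_values) == 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in p_values) / len(p_values))


def is_candidate(read_starts: List[int], pos: int,
                 min_starts: int = 3, max_dist: int = 5) -> bool:
    """Test a site only if >= min_starts read starts lie fewer than max_dist nt away."""
    lo, hi = max(0, pos - max_dist + 1), min(len(read_starts), pos + max_dist)
    return sum(read_starts[lo:hi]) >= min_starts


def combined_p(pos: int, p_in_window: Callable[[int], float],
               x: int = 500, y: int = 50) -> float:
    """Geometric mean of the per-window P-values for one site."""
    return geometric_mean([p_in_window(s) for s in covering_windows(pos, x, y)])


# Hypothetical site at position 1234 with D = 6 (library 2 minus library 1 read starts)
# and slightly different normalized rates in each of its t = 10 windows.
window_rates = {s: (1.0 + s / 10000.0, 1.0) for s in covering_windows(1234)}
p = combined_p(1234, lambda s: skellam_sf(6, *window_rates[s]))
print(p < 0.05)
```

Summing the upper tail instead of computing 1 - F avoids cancellation when F is very close to 1, which is exactly the regime of interest for strongly enriched TSSs.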



Evaluation of the automated TSS annotation method 

To evaluate the predictive power of the automated TSS 
annotation method we used Helicobacter pylori and its 
manually curated TSS map (4) as reference. A data set 
of comparable size to the Xcv data set was generated. 
Reads overlapping with annotated tRNA or rRNA 
genes were excluded. From the H. pylori data set 40 385 
mapped reads of the treated library and 49 845 reads of 
the untreated library were randomly selected and con- 
tained 392 manually annotated TSSs which were used as 
reference class. TSSs were predicted using the same
parameter settings (500 nt window size, 50 nt offset; 0.05
P-value cutoff) as for the Xcv data set. In total, 566 genomic
positions met the criteria for being TSS candidates, i.e. the
clustering of at least three read starts. These positions
represent putative TSSs and were statistically evaluated with
the automatic TSS annotation approach, according to 
(26). The results are summarized in an extended confusion 
matrix (Supplementary Table S9). 
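The comparison against the manually curated map reduces to set operations on (position, strand) pairs. The sketch below assumes exact-position matching, as used for this evaluation, and computes sensitivity TP/(TP + FN) and positive predictive value TP/(TP + FP); true negatives are counted among the tested candidate positions. The function name and toy sets are hypothetical and do not reproduce Supplementary Table S9.

```python
from typing import Dict, Set, Tuple

Site = Tuple[int, str]  # (genomic position, strand)


def evaluate(predicted: Set[Site], reference: Set[Site],
             candidates: Set[Site]) -> Dict[str, float]:
    """Exact-position comparison of predicted TSSs against a reference TSS map.

    candidates: all positions that were tested (e.g. >= 3 clustered read starts);
    they are only needed to count true negatives.
    """
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(candidates - predicted - reference)
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


# Hypothetical toy example (not the H. pylori data).
reference = {(100, "+"), (200, "+"), (300, "-"), (400, "-")}
predicted = {(100, "+"), (200, "+"), (301, "-"), (500, "+")}
candidates = reference | predicted | {(600, "+")}
print(evaluate(predicted, reference, candidates))
```

Note that a prediction one nucleotide away from a reference TSS, such as (301, "-") here, counts as both a false positive and a false negative under exact matching.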



Estimation of expression level 

To estimate the expression level of CDSs in Xcv likely to 
exhibit a proximal promoter, we selected 1276 annotated 
CDSs in a head-to-head arrangement. The set comprised 
549 CDSs with and 727 without annotated TSS. Due to 
the limited sequencing depth of our data set, we combined
reads of both libraries and evaluated the coverage of the
first 100 nt of CDSs (Supplementary Figure S2).
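A minimal sketch of the coverage measure, assuming that reads from both libraries have been pooled, that only sense-strand reads are counted, and that coordinates are 0-based half-open intervals; all names and the toy reads are hypothetical. The mean per-nucleotide coverage of a CDS 5' window equals the summed read overlap divided by the window length.

```python
from typing import List, Tuple

Interval = Tuple[int, int, str]  # 0-based half-open [start, end) and strand '+' or '-'


def five_prime_window(cds: Interval, length: int = 100) -> Tuple[int, int]:
    """The first `length` nt of a CDS, measured from its translation start."""
    start, end, strand = cds
    if strand == "+":
        return start, min(end, start + length)
    return max(start, end - length), end


def mean_coverage(cds: Interval, reads: List[Interval], length: int = 100) -> float:
    """Mean per-nucleotide sense-strand coverage of the CDS 5' window."""
    w_start, w_end = five_prime_window(cds, length)
    covered = 0
    for r_start, r_end, r_strand in reads:
        if r_strand != cds[2]:
            continue  # count sense reads only (an assumption of this sketch)
        covered += max(0, min(r_end, w_end) - max(r_start, w_start))
    return covered / (w_end - w_start)


# Hypothetical reads pooled from both libraries.
reads = [(1000, 1050, "+"), (1020, 1120, "+"), (1000, 1100, "-")]
print(mean_coverage((1000, 1600, "+"), reads))  # 1.3
```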

Detailed information about additional methods is 
provided in SI. 

Further supporting information and the raw sequencing 
data are available at the official institutional website of the 
University of Leipzig (http://www.bioinf.uni-leipzig.de/publications/supplements/10-035).



RESULTS 

Mapping of sequencing reads 

To analyze the primary transcriptome of Xcv, total RNA
samples of strain 85-10 and its derivative 85* were mixed (SI and
Supplementary Table S1). Xcv strain 85* carries a
chromosomal point mutation in hrpG (hrpG*) leading to
constitutive expression of the Hrp regulon. cDNAs were synthesized
from total RNA (untreated library; hereafter library 1) 
and RNA enriched for primary transcripts (treated 
library; hereafter library 2), respectively (4). dRNA-seq 
analysis resulted in 160 349 reads for library 1 and 
149 596 reads for library 2. A total of 84% of the reads 
were mapped to the Xcv genome using the program 
segemehl (27). As previously described, Xcv contains two
identical copies each of the 5S, 23S and 16S rRNA genes
and 56 tRNA loci (9). A total of 63% of
the reads of library 1 and 68% of library 2 reads
mapped to these genes, although the processed rRNAs
and tRNAs were expected to be depleted in library 2.
Closer examination revealed that the majority of 
tRNA-read starts in library 2 correspond to the 
presumed RNase P processing sites rather than TSSs 
(Supplementary Figure S1). To verify our observations,
we analyzed all reads overlapping tRNAs in the
Helicobacter pylori dRNA-seq data set (4) and obtained
consistent results (Supplementary Figure S1). The
abundance of library 2 tRNA reads mapping to putative 
RNase P processing sites might be due to stable secondary 
structures formed after RNase P cleavage thus protecting 
mature tRNAs from exonuclease degradation. We, there- 
fore, discarded the reads mapping to rRNA and tRNA 
loci and analyzed the remaining 49 845 and 40 385 reads 
in more detail. While reads of library 1 cover entire genes, 
the read starts of library 2 are shifted towards the 5'-end of 
primary transcripts, which permits precise mapping of the 
TSS of a given gene (Figure 1A, e.g. XCV0520), as 
described (4). 
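The read processing described above (discarding rRNA/tRNA reads and reducing each remaining read to its 5'-most nucleotide) can be sketched as follows. This is a minimal illustration assuming 0-based half-open coordinates and strand-agnostic removal of reads that overlap excluded loci; a linear scan is used for clarity, whereas an interval tree would be appropriate for whole genomes. All names and data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Read = Tuple[int, int, str]  # mapped read: 0-based half-open [start, end), strand


def overlaps_any(read: Read, loci: List[Tuple[int, int]]) -> bool:
    """True if the read overlaps any [start, end) locus, regardless of strand."""
    start, end, _ = read
    return any(start < l_end and l_start < end for l_start, l_end in loci)


def read_start(read: Read) -> Tuple[int, str]:
    """The 5'-most nucleotide of a read on its own strand."""
    start, end, strand = read
    return (start, "+") if strand == "+" else (end - 1, "-")


def read_start_counts(reads: List[Read], excluded: List[Tuple[int, int]]) -> Counter:
    """Count read starts per (position, strand) after dropping rRNA/tRNA reads."""
    return Counter(read_start(r) for r in reads if not overlaps_any(r, excluded))


# Hypothetical reads and one hypothetical rRNA/tRNA locus at [490, 600).
reads = [(100, 150, "+"), (100, 140, "+"), (300, 350, "-"), (500, 560, "+")]
print(read_start_counts(reads, [(490, 600)]))
```

For a read on the (-)-strand, the 5'-end is the highest covered coordinate, which is why `read_start` returns `end - 1` in that case.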



A statistical model to annotate TSSs 

Most of the TSS maps published to date are derived from
tedious manual inspection of sequencing data (4,24,28) or
from ad hoc heuristics complemented by manual inspection
(29-31). Here, we aimed at the automated identification
of TSSs based on well-defined criteria, i.e. to
discriminate between potential TSSs and the background 
distribution of read starts. This background, however, is 
not uniform across the genome but varies depending on 
gene expression levels. We therefore modeled read starts 
by Poisson distributions depending on the expression level 
in a well-defined genomic neighbourhood. Comparing the 
two libraries, a TSS is defined as a position where the 
observed difference of read starts in both libraries significantly
exceeds the expected differences of read starts
modeled by a Skellam distribution, from which P-values
are readily derived (see 'Materials and Methods' section).
Figure 1. Identification of TSSs, promoter elements and analysis of 5'-UTRs. (A) Distribution of dRNA-seq reads in the chromosomal locus of Xcv
85-10 spanning genes XCV0519 to XCV0524. Annotated CDSs and RNAcode high-scoring segments are highlighted in green and blue, respectively.
Sequencing reads of library 1 (black) and library 2 (red) are shown on top for the (+)-strand and below for the (-)-strand. Predicted TSSs and
corresponding classes are indicated in red. (B) Venn diagram illustrating the TSS classes. TSSs found at most 300 bp upstream of coding sequences
are classified as primary. Internal TSSs are found within genes, and antisense TSSs on the opposite strand of genes (±100 bp). Orphan TSSs do not belong
to any other class. (C) Sequence analysis identified a T/A-rich promoter element for 1205 of 1421 putative TSSs. The histogram depicts the position of
the conserved sequence pattern relative to the annotated TSSs at position +1. (D) 5'-UTR length distribution. The x-axis is split into linear (0-50)
and logarithmic (51-300) scales. The top of the histogram gives the percentage of leaderless (<10 bp), short (<50 bp) and longer UTRs (>50 bp).



Annotation of TSSs 

In total, 1372 chromosomal TSSs and 49 TSSs on the large
plasmid pXCV183 of Xcv (Figure 1B and Supplementary
Table S2) were identified. The data confirm TSSs
determined previously for selected pathogenicity genes,
e.g. hrcU and hrpB1 (20,32). Nevertheless, the majority
of TSSs annotated in our study should be considered
putative. TSSs were classified into four categories, i.e. (i)
primary TSSs located up to 300 bp 5' of an annotated
translation start, (ii) internal TSSs within an annotated
coding sequence (CDS), (iii) antisense TSSs that map to
the opposite strand of CDSs ± 100 bp and (iv) orphan
TSSs that do not belong to the other three categories.
Most of the annotated TSSs are primary TSSs (831) and 
probably correspond to the 5'-end of mRNAs. Overall, 
CDSs that lack an assigned TSS exhibit much lower ex- 
pression levels than CDSs with an annotated TSS (see 
'Materials and Methods' section and Supplementary 
Figure S2). 
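The four TSS categories can be expressed as a small classification function. This is a minimal sketch, not the authors' implementation: it assumes 1-based inclusive CDS coordinates, takes the translation start as the first nucleotide of the start codon, and treats a TSS located exactly at the translation start as primary (leaderless) but not internal. The class and function names and the toy annotation are hypothetical. Because every CDS is checked, one TSS can receive several classes, as in Figure 1B.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CDS:
    name: str
    start: int   # 1-based, inclusive; start <= end on both strands
    end: int
    strand: str  # '+' or '-'

    @property
    def translation_start(self) -> int:
        return self.start if self.strand == "+" else self.end


def classify_tss(pos: int, strand: str, cdss: List[CDS],
                 max_utr: int = 300, flank: int = 100) -> Set[str]:
    """Assign a TSS to primary/internal/antisense (possibly several), else orphan."""
    classes = set()
    for cds in cdss:
        if cds.strand == strand:
            # Distance from the TSS to the translation start, read 5'->3' on the CDS strand.
            upstream = cds.translation_start - pos if strand == "+" else pos - cds.translation_start
            if 0 <= upstream <= max_utr:
                classes.add("primary")
            if cds.start <= pos <= cds.end and pos != cds.translation_start:
                classes.add("internal")
        elif cds.start - flank <= pos <= cds.end + flank:
            classes.add("antisense")
    return classes or {"orphan"}


genes = [CDS("geneA", 1000, 1600, "+"), CDS("geneB", 1700, 2400, "+"),
         CDS("geneC", 3000, 3900, "-")]  # hypothetical annotation
print(sorted(classify_tss(1550, "+", genes)))  # ['internal', 'primary']
```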

As illustrated in Figure 1B, TSSs can belong to more
than one category, e.g. the assumed primary TSS of
XCV0523 is also antisense to XCV0522 (Figure 1A).
Interestingly, 10% (86/831) of primary TSSs are also clas- 
sified as internal. Thus, some neighboring CDSs previous- 
ly supposed to be cotranscribed as part of a polycistronic 
mRNA can also be transcribed from alternative
promoters. As illustrated for XCV0522 (Figure 1A), we
identified 71 putative TSSs which are located within the
first 50 bp of annotated CDSs, suggesting that previously
annotated translation starts have to be revisited
(Supplementary Table S3). Furthermore, 345 TSSs are
located antisense to annotated genes. Interestingly, 41%
of these TSSs are also classified as primary TSSs, including
16 TSSs that correspond to overlapping mRNAs in an
antisense orientation (Supplementary Table S2). In addition, 49
antisense TSSs are positioned in the 3'-region (±100 bp) of
annotated sense genes (Supplementary Table S4). In total,
antisense reads map to 22% of all nucleotides that belong 
to annotated CDSs irrespective of read numbers, the 
presence of a TSS and the expression of the corresponding 
CDSs. The majority of these antisense reads lack automatically
assigned TSSs and do not accumulate in clusters and
thus might not originate from defined antisense
genes. We also compared the sense- and antisense-read
coverage of all annotated CDSs in Xcv and did not
observe a correlation (data not shown).

Most bacterial σ70-dependent promoters contain
conserved sequence elements, i.e. the -35 (TTGACA) and
-10 (TATAAT) elements present in Escherichia coli
(33). In Xcv, there is a weakly conserved T/A-rich motif
in the proximity of -10 regions; however, other conserved
promoter elements and a Shine-Dalgarno (SD) motif are
missing (Figure 1C). This might be due to the high G + C
content (65%) of the Xcv genome (9) and is discussed
below.
Figure 2. Expression of selected Xcv sRNAs and antisense RNAs depends on HrpG and HrpX. Total RNA isolated from exponential (exp) and
stationary phase cultures (stat) of (a) Xcv strain 85-10, (b) 85-10 expressing hrpG* from pFG72-1 and (c) 85-10ΔhrpX carrying pFG72-1 was
analyzed by northern blot. Arrows and filled squares indicate signals corresponding to the expected full-length RNA and processing products
obtained by transcriptome sequencing, respectively. The open square indicates the expected size of full-length asX4 determined by RACE
analysis. The expected size of sX4 according to the sequencing data is marked by an asterisk. 5S rRNA (lower panel) was probed as loading control.



Analysis of 5'-UTRs revealed unexpected size diversity

The lengths of 5'-UTRs deduced from 831 putative 
primary TSSs range from 0 to > 300 bp, with the 
majority being between 10 and 50 bp (Figure 1D).
Surprisingly, 14% of the mRNAs (118 of 831) are leader- 
less, i.e. their 5'-UTR consists of <10bp with respect to 
the annotated genome sequence of Xcv (9). Many of the 
corresponding genes presumably have housekeeping func- 
tions (Supplementary Table S5). In addition, the 5'-UTRs 
of type III effectors were manually inspected. TSSs of 11
described type III effectors from Xcv strain 85-10 (9,13)
were mapped in this study (Supplementary Table S5). The
promoter regions of nine effector genes contain a PIP box
(consensus TTCG-N16-TTCG) (20). The 5'-UTRs of avrBs2,
xopE2, xopJ1 and xopO are of average length. Curiously,
the avrRxv mRNA is leaderless, and six mRNAs (avrBs1,
xopAA, xopB, xopC, xopD and xopN) (9,13) contain
unusually long 5'-UTRs, ranging from 173 to 678 bp.
Consequently, if translation initiates upstream of the
annotated start codons, the CDSs of some effector genes
might be considerably larger than previously described (9).
Overall, 13% of the Xcv 5'-UTRs are
unusually long (150-300 bp; Supplementary Table S5).
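The 5'-UTR length categories used above can be sketched as a small binning function. This is a minimal illustration assuming the TSS position (+1) and the first nucleotide of the start codon are given in genome coordinates. The bins follow the text (leaderless <10 bp, unusually long from 150 bp), while treating exactly 50 bp as "short" is an assumption, because the figure legend leaves that boundary open. All names and toy lengths are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def utr_length(tss: int, strand: str, translation_start: int) -> int:
    """Number of nucleotides between a primary TSS (+1) and the start codon."""
    return translation_start - tss if strand == "+" else tss - translation_start


def utr_class(length: int) -> str:
    """Leaderless < 10 nt, short up to 50 nt, long below 150 nt, else unusually long."""
    if length < 0:
        raise ValueError("TSS lies downstream of the translation start")
    if length < 10:
        return "leaderless"
    if length <= 50:  # boundary choice is an assumption
        return "short"
    if length < 150:
        return "long"
    return "unusually long"


def class_fractions(lengths: Iterable[int]) -> Dict[str, float]:
    """Fraction of 5'-UTRs in each length class."""
    counts = Counter(utr_class(n) for n in lengths)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


print(utr_length(950, "+", 1000), utr_length(3950, "-", 3900))  # 50 50
```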

Northern blot analysis confirmed 23 sRNAs in Xcv 

A computational scan for known RNA elements in Xcv 
identified already annotated tRNAs, rRNAs and the 
recently described ptaRNA1 (34). In addition, we
identified eight putative riboswitches and widely 
conserved RNAs, i.e. RNase P-, RtT-, SRP-, tmRNA 
and 6S-RNA (Figure 2 and Supplementary Table S6). 
Based on our dRNA-seq data, most of these transcripts 
were strongly expressed and TSSs were annotated for four 
of the housekeeping RNAs and five of the predicted 
riboswitches (Supplementary Table S6). The genes 
located downstream of the riboswitch candidates are 
either known to be involved in the respective
riboswitch-controlled pathways in other bacteria or, as in the case of
yybP/ykoY candidates, presumably encode membrane
proteins (35-37) (Supplementary Table S6). 



Prior to the automated TSS prediction, we selected 89 
sRNA candidates by manual inspection of the sequencing 
data with a focus on intergenic regions. We used northern 
blot analysis to experimentally validate sRNA candidates 
and analyzed their potential coregulation with the T3S 
system. To this end, RNA was isolated from exponential 
and stationary phase cultures of NYG-grown Xcv strains
85-10, 85-10 expressing HrpG* and an hrpX deletion
derivative expressing HrpG* (85-10ΔhrpX phrpG*). Northern
hybridizations confirmed 23 new sRNAs, whereas the remaining
candidates either appeared to correspond to longer
transcripts, i.e. UTRs of mRNAs, or were poorly detectable.
The latter can be explained by their low abundance in the 
dRNA-seq data (data not shown). 

After completion of bioinformatic analyses, seven
verified sRNAs turned out to correspond to cis-encoded
antisense RNAs, termed asX1-7 (Table 1, Figure 2 and
Supplementary Figure S3). We detected dRNA-seq reads
mapping to both the antisense RNA and the mRNA for six of
these transcripts, and a few reads mapping to the CDS
complementary to asX4 (data not shown).
The remaining 16 sRNAs mapped to intergenic regions
and were termed sX1-15 and 6S (Table 1, Figures 2, 3A,
4A and Supplementary Figure S3). Intriguingly, three
sRNAs (sX15, asX6, asX7) are encoded on the large 
plasmid, two of which (asX7 and sX15) are in antisense 
orientation to each other (Table 1 and Supplementary 
Figure S3). Most sRNA genes were constitutively ex- 
pressed under the conditions tested, and appeared to ac- 
cumulate in stationary growth phase either due to higher 
transcription rates or increased stability, e.g. sX14 and 6S 
(Figure 2). Interestingly, expression/accumulation of five 
intergenic sRNAs and three antisense RNAs was affected 
by the key regulators of hrp gene expression, HrpG and 
HrpX, suggesting a role of these sRNAs or their targets in 
the interaction of Xcv with the plant. HrpX-dependent
induction of sRNA expression was observed for asX4,
sX5, sX8 (Figure 2) and sX12 (see below), whereas sX11
appeared to be HrpG/HrpX-dependently repressed
(Supplementary Figure S3). In the case of sX4 (Figure 2) and
the antisense RNAs asX1 and asX5 (Supplementary
Figure S3), sRNA stability appeared to depend on
HrpG and HrpX as well as on the growth phase.

Table 1. Verified sRNAs (sX) and antisense RNAs (asX) in Xcv
A: X. campestris pv. vesicatoria 85-10 (NC_007508)
Ap: X. campestris pv. vesicatoria 85-10 plasmid pXCV183 (NC_007507)
B: X. axonopodis pv. citri 306 (NC_003919)
Bp: X. axonopodis pv. citri 306 plasmid pXAC64 (NC_003922)
C1: X. campestris pv. campestris ATCC 33913 (NC_003902)
C2: X. campestris pv. campestris 8004 (NC_007086)
C3: X. campestris pv. campestris B100 (NC_010688)
D1: X. oryzae pv. oryzae MAFF 311018 (NC_007705)
D2: X. oryzae pv. oryzae KACC 10331 (NC_006834)
D3: X. oryzae pv. oryzae PXO99A (NC_010717)
E: X. albilineans GPE PC73 (NC_013722)
F1: Xylella fastidiosa 9a5c (NC_002488)
F2: Xylella fastidiosa Temecula1 (NC_004556)
F3: Xylella fastidiosa M12 (NC_010513)
F4: Xylella fastidiosa M23 (NC_010577)
G1: Stenotrophomonas maltophilia K279a (NC_010943)
G2: S. maltophilia R551-3 (NC_011071)
H: Burkholderia xenovorans LB400 (NC_007951)
I: Acidovorax sp. JS42 (NC_008782)
J: Bordetella petrii DSM 12804 (NC_010170)
Kp: Ralstonia solanacearum CMR15 plasmid pRSC35 (FP885893)
Lp: X. citri plasmid pXcB (AY228335)

Processing of sRNAs 

In general, the dRNA-seq data and northern blots suggest
that Xcv sRNAs do not accumulate as primary transcripts
but undergo growth-phase-dependent processing.
However, in most cases the apparent sizes of full-length
and processed sRNAs in northern blots were in agreement
with the dRNA-seq data, e.g. sX8 and 6S RNA (Figure 2
and Table 1). In addition to full-length and processing
products, northern blots detected unexpectedly long
signals, up to 900 nt, for the antisense RNAs asX1,
asX2, asX3, asX6 and asX7 (Supplementary Figure S3).
These signals may be caused by alternative termination of
transcription. The sequencing data also suggest that sX7,
sX13 and sX14 represent processing products of longer
transcripts, since reads mapping to these loci are
predominantly found in library 1, and no TSS was identified in
library 2 (Table 1). For selected RNAs, the 5'- and 3'-ends
were determined by RACE (Table 1). While the 5'-end of
the antisense RNA asX4 is identical to the TSS identified
by dRNA-seq, the 3'-region is 170 nt longer, suggesting the
presence of a processing site.

Figure 3. sX6 encodes a small protein. (A) Expression analysis of the
sX6 transcript. Total RNA isolated from exponential (exp) and
stationary phase cultures (stat) of (a) Xcv strain 85-10, (b) 85-10 expressing
hrpG* from pFG72-1 and (c) 85-10ΔhrpX carrying pFG72-1 was
analyzed by northern blot. The expected signal according to sequencing
data is indicated by an arrow. 5S rRNA (lower panel) was probed as
loading control. (B) Expression of the sX6 protein. Derivatives of Xcv
strain 85-10 (wt) carrying the promoterless empty vector pBRM-P (-) and
the sX6-c-Myc expression construct, respectively, were grown to
OD600 = 0.7. Protein extracts were analyzed by immunoblotting using
c-Myc epitope-specific and GroEL-specific antibodies.

Phylogenetic distribution of sRNAs from Xcv 

While sX3 and asX5 are unique to Xcv, homology
searches revealed that 10 sRNA genes are exclusively
found in sequenced Xanthomonas species that encode an
hrp-T3S system (Table 1). Four of the latter sRNAs,
including sX12 described in more detail below, and asX5
were coregulated with the T3S system.

Two intergenic sRNAs, sX1 and sX10 (Table 1;
Supplementary Figure S3), are highly similar in sequence
and structure. Three additional homologous genes are
predicted and expressed in Xcv and, together with sX1 and
sX10, might therefore be considered an sRNA family. As three
to six copies of members of this gene family are found in other
Xanthomonas species (Table 1), we propose that the
respective sRNAs are functionally redundant.

Interestingly, 10 homologs of the plasmid-encoded and
complementary Xcv sX15 and asX7 genes are present in
the chromosome of Stenotrophomonas maltophilia strain
K279a (38) (Table 1). Moreover, asX6, which is also
located on pXCV183 of Xcv, is conserved in plasmids of
X. axonopodis pv. citri strain 306 (39), Ralstonia
solanacearum strain CMR15 (40) and X. citri (41)
(Table 1). A rather erratic phylogenetic distribution was
observed for sX8, since homologs are predicted in a small
subset of the known genomes of both beta- and
gamma-proteobacteria (Table 1). Interestingly, this also holds
true for the gene cluster upstream of sX8, which
suggests a common evolutionary origin of this region.
This type of phylogenetic pattern has been observed in
particular for toxin/anti-toxin systems and suggests
frequent horizontal transfer (42).

Figure 4. sX12 is involved in virulence of Xcv. (A) sX12 is HrpX-dependently expressed. Total RNA isolated from exponential (exp) and stationary
phase cultures (stat) of (a) Xcv strain 85-10, (b) Xcv expressing hrpG* from pFG72-1 and (c) a derivative deleted in hrpX and carrying pFG72-1 was
analyzed by northern blot. The right panel shows a northern blot with RNA from (d) Xcv strain 85-10, (e) an sX12 deletion mutant carrying
empty vector pLAFR6 and (f) an sX12 deletion mutant ectopically expressing sX12 from psX12. The expected RNA size is indicated by
an arrow. The asterisk denotes an unspecific signal. 5S rRNA (lower panel) was probed as loading control. (B) sX12 contributes to virulence and the
HR. Strains used in (A) (right panel) were inoculated at a density of 1.25 × 10^8 CFU ml^-1 into leaves of susceptible ECW and resistant ECW-10R
pepper plants. Disease symptoms were photographed at 7 days post-inoculation (dpi). The HR was visualized by ethanol bleaching of the leaves at 2
dpi. Dashed lines indicate the inoculation site.

sX6 encodes a small protein 

Using RNAcode, a program that was previously applied to the
detection of novel protein-coding genes in E. coli (43), we predicted
24 potential short ORFs in the Xcv genome
(see SI and Supplementary Table S7). dRNA-seq reads
mapped to 12 of these loci. One example is sX6 (341 nt),
which is constitutively expressed (Figure 3A) and has a
predicted coding capacity of 80 amino acids, including
a signal peptide in the N-terminal region. We generated
a translational fusion of sX6 with a C-terminal c-Myc
epitope tag under control of the native sX6 promoter
(44) and introduced the expression construct into Xcv
strain 85-10. As shown in Figure 3B, a fusion protein of
the predicted molecular mass (~12 kDa) was detectable in
protein extracts of Xcv.

Besides sX6, TSSs for two of the predicted ORFs with a 
coding capacity of 36 and 67 amino acids, respectively, 
were predicted (Supplementary Table S7). Interestingly, 
homologs of genes for the three small proteins are exclu- 
sively found in xanthomonads encoding a hrp-T3S system. 

sX12 contributes to virulence 

The fact that several sRNAs are expressed under control 
of the T3S system regulators suggested a possible role in 
virulence. Here, we focused on sX12, whose size of 78 nt
was confirmed by 5'- and 3'-RACE (Table 1). As
mentioned above, sX12 expression is induced in an
HrpX-dependent manner, and the RNA accumulates in
stationary growth phase (Figure 4A). To assess the
contribution of sX12 to virulence, we generated a deletion
mutant derivative of strain 85-10 (ΔsX12). While growth of
strain ΔsX12 in planta was similar to that of the wild type
(Supplementary Figure S4), plant reactions were altered.
Disease symptoms in leaves of infected
susceptible (ECW) and the HR in resistant (ECW-10R)
pepper plants were delayed with strain ΔsX12 when
compared to the wild type (Figure 4B). The ΔsX12
mutant phenotype was complemented by ectopic
expression of sX12 under control of its own promoter (Figure 4).
We also performed T3S assays to analyze whether the
delay in plant reactions caused by strain ΔsX12 might be due to
reduced protein levels of T3S system components, e.g. the
conserved apparatus component HrcJ, or to reduced secretion
of T3S substrates, e.g. the translocon protein HrpF.
However, the detected protein amounts and the secretion
of HrpF were comparable for the wild type and the ΔsX12
mutant (Supplementary Figure S4).



DISCUSSION 

The dRNA-seq-based analysis of the Xcv transcriptome 
led to remarkable insights into the transcriptional land- 
scape of this important model plant pathogen and 
identified an sRNA with a role in virulence. In this 
study, we have devised a new method to automatically
generate maps of TSSs for dRNA-seq data sets, obviating
the need for manual inspection and allowing application
of dRNA-seq to genomes larger than that of Xcv. In
contrast to earlier dRNA-seq approaches, mostly based 
on laborious manual inspection of sequencing data 
(4,24,28), the presented computational approach 
provides a measure of statistical confidence and ensures 
that predictions are comparable between different studies 
as demonstrated by our comparative analysis between 
manual and automated annotation of the previously pub- 
lished H. pylori transcriptome (4). While the sensitivity of 
82% demonstrates the method's capability of recovering
manually annotated TSSs at exactly the same position, a
positive predictive value of 72% indicates its reliability
(Supplementary Table S9). However, dynamic adjustment
of parameters such as significance levels remains
a subject for further research. We used only exact
matches between the manual and automated TSS maps for this
analysis. The numbers of false positives and negatives
might therefore be overestimated and are affected by biases
introduced by manual inspection. Several parameters,
including window sizes to determine local expression
levels, minimum coverage and significance thresholds to
control sensitivity and specificity, were fixed
globally for this study.
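The exact-match evaluation described above can be illustrated with a minimal Python sketch. It assumes that each TSS is represented as a hypothetical (position, strand) pair and counts a prediction as correct only when position and strand agree exactly; it is not the evaluation code behind Supplementary Table S9. The toy example shows how a 1-bp offset is counted as both a false positive and a false negative, which is why exact matching may overestimate both error types.

```python
from typing import Iterable, Set, Tuple

# Hypothetical TSS representation: (genomic position, strand '+' or '-').
TSS = Tuple[int, str]


def exact_match_metrics(manual: Iterable[TSS], automated: Iterable[TSS]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) using exact matches only.

    An automated TSS counts as a true positive only if the manual map contains
    a TSS at exactly the same position and on the same strand.
    """
    manual_set: Set[TSS] = set(manual)
    auto_set: Set[TSS] = set(automated)
    tp = len(manual_set & auto_set)   # annotated in both maps
    fn = len(manual_set - auto_set)   # manual TSSs missed by the method
    fp = len(auto_set - manual_set)   # predictions absent from the manual map
    sensitivity = tp / (tp + fn) if manual_set else 0.0
    ppv = tp / (tp + fp) if auto_set else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    manual = [(100, "+"), (250, "+"), (900, "-"), (1500, "-"), (2000, "+")]
    # (251, '+') is off by 1 bp from (250, '+') and is scored as an FP and an FN.
    automated = [(100, "+"), (251, "+"), (900, "-"), (1500, "-"), (3000, "+")]
    sens, ppv = exact_match_metrics(manual, automated)
    print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")  # sensitivity=0.60, PPV=0.60
```

A tolerance window (e.g. counting matches within a few bp) would be a natural variant, but that is an assumption beyond the exact-match criterion used here.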

We annotated 1421 putative TSSs in Xcv 
(Supplementary Table S2) including riboswitches and 
genes for conserved housekeeping and novel sRNAs. 
Interestingly, 178 TSSs correspond to antisense transcripts 
including six that map to type III effector genes and to 
hrcC, which is transcriptionally induced by HrpX and 
essential for T3S and pathogenicity (Supplementary 
Tables S2 and S4) (10,45). The potential role of 
post-transcriptional regulation in Xcv is further supported 
by the finding that 22% of all nucleotides that belong to 
annotated CDSs are covered by antisense reads. 
Nevertheless, the majority of these reads might be
derived from promiscuous transcription initiation, as has
also been suggested for E. coli (46). It remains to be
clarified whether the identified antisense transcripts in 
Xcv represent functional gene products or the transcrip- 
tion itself has a regulatory function. 
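The antisense coverage statistic (the share of CDS nucleotides covered by antisense reads) can be sketched as follows. This is a hypothetical illustration assuming 0-based, end-exclusive intervals given as (start, end, strand); it uses Python sets of positions for clarity, which is adequate for toy inputs but memory-hungry for a full genome, where a per-strand coverage array would be preferable.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical interval format: (start, end, strand), 0-based, end-exclusive.
Interval = Tuple[int, int, str]


def _positions_by_strand(intervals: List[Interval]) -> Dict[str, Set[int]]:
    """Collect covered positions per strand (sets avoid double-counting overlaps)."""
    pos: Dict[str, Set[int]] = {"+": set(), "-": set()}
    for start, end, strand in intervals:
        pos[strand].update(range(start, end))
    return pos


def antisense_cds_coverage(cds: List[Interval], reads: List[Interval]) -> float:
    """Fraction of CDS nucleotides overlapped by at least one antisense read."""
    cds_pos = _positions_by_strand(cds)
    read_pos = _positions_by_strand(reads)
    opposite = {"+": "-", "-": "+"}
    total = len(cds_pos["+"]) + len(cds_pos["-"])
    if total == 0:
        return 0.0
    # A CDS nucleotide counts only if a read on the *opposite* strand spans it.
    covered = sum(len(cds_pos[s] & read_pos[opposite[s]]) for s in ("+", "-"))
    return covered / total


if __name__ == "__main__":
    cds = [(0, 100, "+"), (200, 300, "-")]
    # The sense read (0, 100, '+') over the '+' CDS is ignored.
    reads = [(50, 80, "-"), (90, 120, "-"), (250, 260, "+"), (0, 100, "+")]
    print(antisense_cds_coverage(cds, reads))  # 0.25
```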

We identified 831 putative primary TSSs, which were 
assigned to 17.35% of the 4726 annotated CDSs 
(Figure 1B and Supplementary Table S2) (9). Similarly,
in the archaeon Methanosarcina mazei, TSSs were assigned
to ~20% of the CDSs (24). A considerably larger
number of TSSs, corresponding to 60% of the CDSs,
were recently mapped in the plant symbiont
Sinorhizobium meliloti (31) and the human pathogen
H. pylori (>50%) (4). This difference might be explained by the
larger number of conditions analyzed and/or the higher
number of sequencing reads in those studies, an explanation supported by our
finding that TSSs in Xcv are predominantly assigned to
CDSs with high expression levels, whereas CDSs without
an assigned TSS are generally weakly expressed or not expressed
(Supplementary Figure S2).
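How TSSs are assigned to CDSs can be illustrated with a simplified classifier. The sketch below assumes the 300-bp upstream window mentioned later in this section for primary TSSs, treats TSSs inside a CDS on the same strand as internal and inside a CDS on the opposite strand as antisense, and labels everything else with a hypothetical "orphan" class. Unlike a full annotation pipeline, it does not rank several TSSs upstream of the same gene, and the `CDS` class and `classify_tss` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Set

UPSTREAM_WINDOW = 300  # bp upstream of the start codon counted as "primary" (assumed cutoff)


@dataclass
class CDS:
    start: int   # leftmost coordinate (0-based, inclusive)
    end: int     # rightmost coordinate (exclusive)
    strand: str  # '+' or '-'


def classify_tss(pos: int, strand: str, cdss: List[CDS]) -> Set[str]:
    """Return the hypothetical classes of a TSS at `pos` on `strand`."""
    labels: Set[str] = set()
    for c in cdss:
        if c.strand == strand:
            # Distance from the TSS to the first base of the start codon,
            # measured in the direction of transcription (0 = leaderless).
            dist = c.start - pos if strand == "+" else pos - (c.end - 1)
            if 0 <= dist <= UPSTREAM_WINDOW:
                labels.add("primary")
            elif c.start <= pos < c.end:
                labels.add("internal")
        elif c.start <= pos < c.end:
            labels.add("antisense")
    return labels or {"orphan"}


if __name__ == "__main__":
    genes = [CDS(1000, 1600, "+"), CDS(2000, 2900, "-")]
    for tss, s in [(800, "+"), (1200, "+"), (2500, "+"), (3100, "-"), (5000, "+")]:
        print(tss, s, sorted(classify_tss(tss, s, genes)))
```

Because the rule only checks distance and strand, any TSS within 300 bp upstream of a CDS is labeled primary even if it actually belongs to an sRNA gene, which is the classification issue noted for sX1 further below.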

In Xcv, the majority of 5'-UTRs appear to be <50 bp
(Figure 1D), which is characteristic of bacteria (3).
Surprisingly, there is no clear consensus sequence for
ribosome binding. A recent study (47) analyzed the
evolution of translation initiation in prokary-
otes and found that Shine-Dalgarno (SD)-initiated translation in
xanthomonads is unlikely. In good agreement with this,
we identified an unexpectedly high number of leaderless
mRNAs in Xcv (Supplementary Table S5), suggesting an
alternative mechanism of ribosome guidance. In Xcv, tran-
scription of 82% of the leaderless mRNAs starts with
AUG, which was shown to be essential for stable
ribosome binding to these transcripts in E. coli (48).
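The 5'-UTR length and the leaderless criterion can be computed directly from a TSS and the annotated start codon. The sketch below is a hypothetical illustration assuming 0-based forward-strand coordinates and that, for '-' strand genes, `cds_start` is the coordinate of the first base of the start codon (the highest coordinate of the CDS). An mRNA is called leaderless when the TSS coincides with that base, and its first three nucleotides are reported as RNA so that AUG starts can be counted.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def utr_features(genome: str, tss: int, cds_start: int, strand: str) -> Tuple[int, bool, str]:
    """Return (5'-UTR length, leaderless?, first three transcribed nucleotides as RNA).

    Assumes 0-based forward-strand coordinates and a TSS at least 2 bp from the
    sequence ends; for '-' strand genes cds_start is the first base of the start
    codon, i.e. the highest coordinate of the CDS.
    """
    if strand == "+":
        utr_len = cds_start - tss
        first3 = genome[tss:tss + 3]
    else:
        utr_len = tss - cds_start
        # Transcription runs leftwards: read tss, tss-1, tss-2 and complement.
        first3 = revcomp(genome[tss - 2:tss + 1])
    if utr_len < 0:
        raise ValueError("TSS lies downstream of the annotated start codon")
    return utr_len, utr_len == 0, first3.replace("T", "U")


if __name__ == "__main__":
    plus = "GGCATGAAACCCTAG"
    print(utr_features(plus, 3, 3, "+"))   # (0, True, 'AUG')
    print(utr_features(plus, 0, 3, "+"))   # (3, False, 'GGC')
    minus = "AAAACATCC"                    # 'CAT' on '+' is ATG on '-'
    print(utr_features(minus, 6, 6, "-"))  # (0, True, 'AUG')
    print(utr_features(minus, 8, 6, "-"))  # (2, False, 'GGA')
```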

Unusually long 5'-UTRs as identified here for Xcv 
might be indicative of extensive post-transcriptional regu- 
lation, e.g. by sRNA-mediated modulation of mRNA 
translation or transcript stability. Since the 5'-UTRs of
genes encoding type III effector proteins are also unusually
long (Supplementary Table S5), these 5'-UTRs might play a
role in virulence. For instance, the genes for
the type III effector proteins XopN and XopAA, shown to 
be important virulence factors of Xcv (49,50), comprise 
5'-UTRs of 173 and 477 bp, respectively (Supplementary 
Table S5). In H. pylori, mRNAs of genes involved in 
pathogenesis also carry long 5'-UTRs (4). 

Another potential implication of the high number of 
unusually long 5'-UTRs in Xcv is that the respective
CDSs might be longer than predicted by the genome an- 
notation as shown recently for the type III effector protein 
XopD (51). On the other hand, a number of CDSs are 
presumably shorter than annotated, as indicated by 71 internal
TSSs located within the first 50 bp of annotated
CDSs (Supplementary Table S3). We also identified 12 
new expressed loci with potential coding capacity
(Supplementary Table S7), exemplified by sX6, which
encodes an 80-amino-acid protein (Figure 3). Hence, this
study contributes to a first refinement of CDS annotation 
in Xcv. 
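One way to follow up internal TSSs close to the annotated start is to look for the next in-frame start codon downstream of the TSS, which would be a candidate for a shorter, re-annotated CDS. The sketch below is a hypothetical illustration, not the procedure used in this study: it handles '+' strand genes only, accepts only ATG (ignoring alternative start codons such as GTG and TTG), and uses the 50-bp window from the text.

```python
from typing import Optional

INTERNAL_WINDOW = 50  # internal TSSs within the first 50 bp of a CDS, as in the text


def candidate_new_start(genome: str, cds_start: int, cds_end: int, tss: int) -> Optional[int]:
    """Return the first in-frame ATG at or downstream of an internal TSS, or None.

    Hypothetical helper for a '+' strand CDS spanning genome[cds_start:cds_end]
    (0-based, end-exclusive, ending with its stop codon). Only TSSs 1-49 bp into
    the CDS are considered.
    """
    offset = tss - cds_start
    if not 0 < offset < INTERNAL_WINDOW:
        return None
    # First codon boundary of the annotated reading frame at or after the TSS
    # (ceiling division of the offset by 3).
    pos = cds_start + 3 * -(-offset // 3)
    while pos + 3 <= cds_end - 3:  # stop before the terminal stop codon
        if genome[pos:pos + 3] == "ATG":
            return pos
        pos += 3
    return None


if __name__ == "__main__":
    # Codons: ATG AAA CCC GGG ATG TTT TAA
    gene = "ATGAAACCCGGGATGTTTTAA"
    print(candidate_new_start(gene, 0, len(gene), 7))   # 12
    print(candidate_new_start(gene, 0, len(gene), 60))  # None (outside the window)
```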

sRNAs represent important post-transcriptional regula- 
tors involved in a variety of processes such as quorum 
sensing (52) and virulence (53). In this study, by combin-
ing manual and automated inspection of the cDNA
sequencing data with northern blot analyses, we verified 23 sRNAs in
Xcv, seven of which represent antisense RNAs (Table 1).
For six of the antisense RNAs we also detected expression 
of the complementary mRNAs. It should be noted, 
however, that our data do not allow us to distinguish
between cells that express both transcripts at the same
time and cells that express only the mRNA or only the anti-
sense RNA.

Notably, expression of five intergenic sRNAs and three 
antisense RNAs verified in this study was affected by the 
master regulators of Xcv virulence, HrpG and/or HrpX 
(Table 1) (16,18,19). Coregulation of sRNA expression 
with the T3S system clearly suggests a role of these tran- 
scripts in the interaction of Xcv with its host plant. As a 
proof-of-principle, we have demonstrated that sX12 con-
tributes to virulence of Xcv (Figure 4B). Lack of sX12 does
not affect bacterial growth inside the host or T3S, i.e.
bacterial fitness is not impaired (Supplementary Figure
S4). What might be the targets of sX12? Preliminary ex- 
periments did not reveal an effect of the absence of sX12 
on selected hrp (T3S) genes, i.e. transcript and protein 
accumulation was unaltered. Instead of regulating 
mRNA targets, sX12 might control gene expression in a 
different manner, e.g. by binding to proteins, DNA or 
metabolites. Furthermore, sX12 might impinge on the ef- 
ficiency of the T3S system, similar to the Salmonella 
typhimurium sRNA IsrJ, which accumulates under infec-
tion conditions and positively contributes to invasion and
effector translocation (54).

After our analysis was complete, the identification of 
eight sRNAs in Xanthomonas oryzae pv. oryzae (Xoo)
strain PXO99A was reported (55). In agreement with
our data, the Xoo sRNAs Xoo3, XooA and Xoo6 repre-
sent orthologs of the Xcv RNAs sX14, asX4 and sX1,
respectively (Table 1). In contrast to XooA (55), which is
145 nt long, our analyses revealed that asX4 in Xcv is 309 nt
long and encoded antisense to an annotated CDS. We
also identified potential TSSs for the Xcv homologs of
Xoo1 and Xoo5. By contrast, Xcv lacks homologs of previously
described bacterial sRNA genes, except for housekeeping RNAs.
Conversely, the majority of sRNAs identified in Xcv are re-
stricted to the genera Xanthomonas, Xylella and
Stenotrophomonas (Table 1) and thus reflect the current
taxonomy (56). An estimation of the total number of
sRNAs in Xcv is hampered by the relatively small
number of sequence reads and by the fact that, for
example, TSSs of sRNA genes in the proximity
(<300 bp) of downstream CDSs are classified as primary
TSSs (see sX1; Table 1).

A remarkable finding of this study is the indication of 
frequent processing of Xcv sRNAs, which appears to be 
growth-phase dependent. In several studies, sRNA pro- 
cessing was shown to affect sRNA activity, e.g. GlmZ 
from E. coli, which is cleaved and thus inactivated
(57,58). The E. coli sRNA IstR-1 is rendered inactive by
RNase III-dependent cleavage upon sRNA-mRNA inter-
action (59). In contrast, MicX from Vibrio cholerae is
stabilized by RNase E-mediated cleavage, which does not
impair its interaction with target mRNAs (60). Which
ribonucleases are involved in processing of the Xcv 
sRNAs is not known. 

The analysis of additional knock-out mutants is needed 
to assess sRNA functions in Xcv. In the case of virulence
phenotypes, a challenge will be the identification of the
targets. Besides mRNAs, the targets of sRNAs can also
be RNA-binding proteins. To the
best of our knowledge, the only reported sRNAs 
involved in the regulation of virulence gene expression in 
plant pathogenic bacteria are members of the RsmB 
family which was studied in Erwinia carotovora ssp. 
carotovora. RsmB antagonizes the RNA-binding protein 
RsmA that acts as translational repressor (61-63). 
Although a major virulence function has been reported for
RsmA from X. campestris pv. campestris, the interacting
sRNAs are not yet known (5). Their identification is complicated by
the lack of CsrB/RsmB sequence homologs in
xanthomonads.